neuron coverage
NCCR: to Evaluate the Robustness of Neural Networks and Adversarial Examples
Pu, Shi, Song, Fu, Wang, Wenjie
Neural networks have received a lot of attention recently, and related security issues have come with it. Many studies have shown that neural networks are vulnerable to adversarial examples that have been artificially perturbed with modification, which is too small to be distinguishable by human perception. Different attacks and defenses have been proposed to solve these problems, but there is little research on evaluating the robustness of neural networks and their inputs. In this work, we propose a metric called the neuron cover change rate (NCCR) to measure the ability of deep learning models to resist attacks and the stability of adversarial examples. NCCR monitors alterations in the output of specifically chosen neurons when the input is perturbed, and networks with a smaller degree of variation are considered to be more robust. The results of the experiment on image recognition and the speaker recognition model show that our metrics can provide a good assessment of the robustness of neural networks or their inputs. It can also be used to detect whether an input is adversarial or not, as adversarial examples are always less robust.
Exploring Adversarial Attacks on Neural Networks: An Explainable Approach
Renkhoff, Justus, Tan, Wenkai, Velasquez, Alvaro, Wang, illiam Yichen, Liu, Yongxin, Wang, Jian, Niu, Shuteng, Fazlic, Lejla Begic, Dartmann, Guido, Song, Houbing
Deep Learning (DL) is being applied in various domains, especially in safety-critical applications such as autonomous driving. Consequently, it is of great significance to ensure the robustness of these methods and thus counteract uncertain behaviors caused by adversarial attacks. In this paper, we use gradient heatmaps to analyze the response characteristics of the VGG-16 model when the input images are mixed with adversarial noise and statistically similar Gaussian random noise. In particular, we compare the network response layer by layer to determine where errors occurred. Several interesting findings are derived. First, compared to Gaussian random noise, intentionally generated adversarial noise causes severe behavior deviation by distracting the area of concentration in the networks. Second, in many cases, adversarial examples only need to compromise a few intermediate blocks to mislead the final decision. Third, our experiments revealed that specific blocks are more vulnerable and easier to exploit by adversarial examples. Finally, we demonstrate that the layers $Block4\_conv1$ and $Block5\_cov1$ of the VGG-16 model are more susceptible to adversarial attacks. Our work could provide valuable insights into developing more reliable Deep Neural Network (DNN) models.
Neuroevolutionary algorithms driven by neuron coverage metrics for semi-supervised classification
Santana, Roberto, Hidalgo-Cenalmor, Ivan, Garciarena, Unai, Mendiburu, Alexander, Lozano, Jose Antonio
In some machine learning applications the availability of labeled instances for supervised classification is limited while unlabeled instances are abundant. Semi-supervised learning algorithms deal with these scenarios and attempt to exploit the information contained in the unlabeled examples. In this paper, we address the question of how to evolve neural networks for semi-supervised problems. We introduce neuroevolutionary approaches that exploit unlabeled instances by using neuron coverage metrics computed on the neural network architecture encoded by each candidate solution. Neuron coverage metrics resemble code coverage metrics used to test software, but are oriented to quantify how the different neural network components are covered by test instances. In our neuroevolutionary approach, we define fitness functions that combine classification accuracy computed on labeled examples and neuron coverage metrics evaluated using unlabeled examples. We assess the impact of these functions on semi-supervised problems with a varying amount of labeled instances. Our results show that the use of neuron coverage metrics helps neuroevolution to become less sensitive to the scarcity of labeled data, and can lead in some cases to a more robust generalization of the learned classifiers.
DeepSensor: Deep Learning Testing Framework Based on Neuron Sensitivity
Jin, Haibo, Chen, Ruoxi, Zheng, Haibin, Chen, Jinyin, Liu, Zhenguang, Xuan, Qi, Yu, Yue, Cheng, Yao
Despite impressive capabilities and outstanding performance, deep neural network(DNN) has captured increasing public concern for its security problem, due to frequent occurrence of erroneous behaviors. Therefore, it is necessary to conduct systematically testing before its deployment to real-world applications. Existing testing methods have provided fine-grained criteria based on neuron coverage and reached high exploratory degree of testing. But there is still a gap between the neuron coverage and model's robustness evaluation. To bridge the gap, we observed that neurons which change the activation value dramatically due to minor perturbation are prone to trigger incorrect corner cases. Motivated by it, we propose neuron sensitivity and develop a novel white-box testing framework for DNN, donated as DeepSensor. The number of sensitive neurons is maximized by particle swarm optimization, thus diverse corner cases could be triggered and neuron coverage be further improved when compared with baselines. Besides, considerable robustness enhancement can be reached when adopting testing examples based on neuron sensitivity for retraining. Extensive experiments implemented on scalable datasets and models can well demonstrate the testing effectiveness and robustness improvement of DeepSensor.
Automated Testing for Deep Learning Systems with Differential Behavior Criteria
In this work, we conducted a study on building an automated testing system for deep learning systems based on differential behavior criteria. The automated testing goals were achieved by jointly optimizing two objective functions: maximizing differential behaviors from models under testing and maximizing neuron coverage. By observing differential behaviors from three pre-trained models during each testing iteration, the input image that triggered erroneous feedback was registered as a corner-case. The generated corner-cases can be used to examine the robustness of DNNs and consequently improve model accuracy. A project called DeepXplore was also used as a baseline model. After we fully implemented and optimized the baseline system, we explored its application as an augmenting training dataset with newly generated corner cases. With the GTRSB dataset, by retraining the model based on automated generated corner cases, the accuracy of three generic models increased by 259.2%, 53.6%, and 58.3%, respectively. Further, to extend the capability of automated testing, we explored other approaches based on differential behavior criteria to generate photo-realistic images for deep learning systems. One approach was to apply various transformations to the seed images for the deep learning framework. The other approach was to utilize the Generative Adversarial Networks (GAN) technique, which was implemented on MNIST and Driving datasets. The style transferring capability has been observed very effective in adding additional visual effects, replacing image elements, and style-shifting (virtual image to real images). The GAN-based testing sample generation system was shown to be the next frontier for automated testing for deep learning systems.
DeepEvolution: A Search-Based Testing Approach for Deep Neural Networks
Braiek, Houssem Ben, khomh, Foutse
The increasing inclusion of Deep Learning (DL) models in safety-critical systems such as autonomous vehicles have led to the development of multiple model-based DL testing techniques. One common denominator of these testing techniques is the automated generation of test cases, e.g., new inputs transformed from the original training data with the aim to optimize some test adequacy criteria. So far, the effectiveness of these approaches has been hindered by their reliance on random fuzzing or transformations that do not always produce test cases with a good diversity. To overcome these limitations, we propose, DeepEvolution, a novel search-based approach for testing DL models that relies on metaheuristics to ensure a maximum diversity in generated test cases. We assess the effectiveness of DeepEvolution in testing computer-vision DL models and found that it significantly increases the neuronal coverage of generated test cases. Moreover, using DeepEvolution, we could successfully find several corner-case behaviors. Finally, DeepEvolution outperformed Tensorfuzz (a coverage-guided fuzzing tool developed at Google Brain) in detecting latent defects introduced during the quantization of the models. These results suggest that search-based approaches can help build effective testing tools for DL systems.
DeepGauge: Comprehensive and Multi-Granularity Testing Criteria for Gauging the Robustness of Deep Learning Systems
Ma, Lei, Juefei-Xu, Felix, Sun, Jiyuan, Chen, Chunyang, Su, Ting, Zhang, Fuyuan, Xue, Minhui, Li, Bo, Li, Li, Liu, Yang, Zhao, Jianjun, Wang, Yadong
Deep learning defines a new data-driven programming paradigm that constructs the internal system logic of a crafted neuron network through a set of training data. Deep learning (DL) has been widely adopted in many safety-critical scenarios. However, a plethora of studies have shown that the state-of-the-art DL systems suffer from various vulnerabilities which can lead to severe consequences when applied to real-world applications. Currently, the robustness of a DL system against adversarial attacks is usually measured by the accuracy of test data. Considering the limitation of accessible test data, good performance on test data can hardly guarantee the robustness and generality of DL systems. Different from traditional software systems which have clear and controllable logic and functionality, a DL system is trained with data and lacks thorough understanding. This makes it difficult for system analysis and defect detection, which could potentially hinder its real-world deployment without safety guarantees. In this paper, we propose DeepGauge, a comprehensive and multi-granularity testing criteria for DL systems, which renders a complete and multi-faceted portrayal of the testbed. The in-depth evaluation of our proposed testing criteria is demonstrated on two well-known datasets, five DL systems, with four state-of-the-art adversarial data generation techniques. The effectiveness of DeepGauge sheds light on the construction of robust DL systems.
A White-Box Testing Model For Deep Learning Systems
How do you find errors in a system that exists in a black box whose contents are a mystery even to experts? That is one of the challenges of perfecting self-driving cars and other deep learning systems that are based on artificial neural networks--known as deep neural networks--modeled after the human brain. Inside these systems, a web of neurons enables a machine to process data with a nonlinear approach and, essentially, to teach itself to analyze information through what is known as training data. When an input is presented to a "trained" system--like an image of a typical two-lane highway shown to a self-driving car platform--the system recognizes it by running an analysis through its complex logic system. This process largely occurs inside a black box and is not fully understood by anyone, including a system's creators. Any errors also occur inside the black box and are thus difficult to identify and fix.